
    NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language

    Background: Shotgun metagenomes contain a sample of all the genomic material in an environment, allowing for the characterization of a microbial community. Bioinformatics methods are crucial for understanding these communities. A common first step in processing metagenomes is to compute abundance estimates of different taxonomic or functional groups from the raw sequencing data. Given the breadth of the field, computational solutions need to be flexible and extensible, enabling the combination of different tools into a larger pipeline. Results: We present NGLess and NG-meta-profiler. NGLess is a domain-specific language for describing next-generation sequence processing pipelines. It was developed with the goal of enabling user-friendly computational reproducibility. It provides built-in support for many common operations on sequencing data and can be extended with external tools through configuration files. Using this framework, we developed NG-meta-profiler, a fast profiler for metagenomes that performs sequence preprocessing, mapping to bundled databases, filtering of the mapping results, and profiling (taxonomic and functional). It is significantly faster than either MOCAT2 or htseq-count and, as it builds on NGLess, its results are perfectly reproducible. Conclusions: NG-meta-profiler is a high-performance solution for metagenomics processing built on NGLess. It can be used as-is to execute standard analyses or serve as the starting point for customization in a perfectly reproducible fashion. NGLess and NG-meta-profiler are open source software (under the liberal MIT license) and can be downloaded from https://ngless.embl.de or installed through bioconda.
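
The last two stages named above, filtering of the mapping results and profiling, can be illustrated with a small Python sketch. It is not NGLess code and not NG-meta-profiler's implementation: the `Hit` record, the 97% identity cutoff, the gene-to-group table, and the rule of discarding reads that hit more than one group are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of a read to a reference gene (hypothetical record)."""
    read_id: str
    gene_id: str
    identity: float  # percent identity of the alignment


def profile(hits, gene_to_group, min_identity=97.0):
    """Return the relative abundance of each group from read-to-gene alignments.

    Assumptions for this sketch: alignments below `min_identity` are dropped,
    reads whose remaining hits fall into more than one group are discarded as
    ambiguous, and abundances are normalized to sum to 1.
    """
    groups_per_read = {}
    for hit in hits:
        if hit.identity < min_identity:
            continue  # filtering step: discard low-identity alignments
        group = gene_to_group.get(hit.gene_id)
        if group is None:
            continue  # gene has no taxonomic/functional label in the table
        groups_per_read.setdefault(hit.read_id, set()).add(group)

    counts = Counter()
    for groups in groups_per_read.values():
        if len(groups) == 1:  # keep only unambiguously assigned reads
            counts[next(iter(groups))] += 1

    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


if __name__ == "__main__":
    hits = [
        Hit("r1", "geneA", 99.0),
        Hit("r2", "geneA", 98.5),
        Hit("r3", "geneB", 99.2),
        Hit("r4", "geneB", 90.0),  # removed by the identity filter
        Hit("r5", "geneA", 99.0),
        Hit("r5", "geneC", 99.0),  # r5 hits two groups, so it is ambiguous
    ]
    gene_to_group = {"geneA": "taxonA", "geneB": "taxonB", "geneC": "taxonC"}
    print(profile(hits, gene_to_group))
```

Discarding ambiguous reads is only one possible policy; distributing them fractionally across their candidate groups is a common alternative.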

    Magnetocaloric Effect And Evidence Of Superparamagnetism In GdAl2 Nanocrystallites: A Magnetic-structural Correlation

    The correlation between the structural and magnetic properties of GdAl2 is discussed, focusing on the role played by disorder in magnetic ordering and on how it influences the magnetocaloric effect (MCE). Micrometric-sized particles, consisting of nanocrystallites embedded in an amorphous matrix, were prepared by mechanical milling and characterized by x-ray diffraction, scanning and high-resolution transmission electron microscopy, and magnetic measurements as a function of applied magnetic field and temperature. The results show that the average particle size decreases only slightly (≈7%) with milling time (between 3 and 13 h), whereas the average crystallite size is substantially reduced (≈43%). For long milling times, structural disorder, mostly associated with crystallite size, markedly affects the magnetic properties, leading to a large table-like MCE in the temperature range between 30 and 165 K. Below 30 K, nanocrystallites smaller than a given critical size enhance the magnetic entropy change through superparamagnetic behavior. In contrast, for short milling times, relative cooling power values are improved. These striking features, along with the small magnetic hysteresis observed, make milled GdAl2 a promising material for magnetic refrigeration technology. Finally, the possible origin of the spin-glass states previously reported in the literature for GdAl2 samples mechanically milled for very long times (400 and 1000 h) is discussed. © 2016 American Physical Society.
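
For reference, the magnetic entropy change and relative cooling power mentioned above are commonly evaluated from isothermal magnetization data with the Maxwell relation and the width of the entropy-change curve. These are the standard textbook forms (SI units, with $M$ and $H$ in A/m), not necessarily the exact procedure used in this study; $\delta T_{\mathrm{FWHM}}$ denotes the full width at half maximum of the $-\Delta S_M(T)$ curve.

```latex
\Delta S_M(T, H) = \mu_0 \int_0^{H} \left( \frac{\partial M}{\partial T} \right)_{H'} \, dH'
\qquad
\mathrm{RCP} = \left| \Delta S_M^{\max} \right| \times \delta T_{\mathrm{FWHM}}
```

Because RCP scales with $\delta T_{\mathrm{FWHM}}$, a broad, table-like $-\Delta S_M(T)$ curve contributes to RCP through its width as well as its peak height.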

    String Indexing for Patterns with Wildcards

    We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results.
    - A linear space index with query time $O(m + \sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case.
    - An index with query time $O(m + j + occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
    - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
    We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
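
To make the query semantics concrete, the Python function below reports every position where a pattern with wildcards occurs in $t$. It is a minimal sketch that assumes `?` as the wildcard symbol (a hypothetical choice) and performs a naive $O(nm)$-time scan; it implements none of the indexes described above, which exist precisely to avoid scanning the whole text per query.

```python
def wildcard_occurrences(text: str, pattern: str, wildcard: str = "?") -> list[int]:
    """Return all 0-based start positions where `pattern` matches `text`.

    A wildcard character in the pattern matches any single character of the
    text. This naive scan tries every alignment, so it takes O(n * m) time
    for len(text) = n and len(pattern) = m.
    """
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        if all(p == wildcard or p == text[start + i] for i, p in enumerate(pattern)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Pattern with 3 ordinary characters and 1 wildcard.
    print(wildcard_occurrences("abracadabra", "a?ra"))  # prints [0, 7]
```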

    MOCAT2: a metagenomic assembly, annotation and profiling framework

    MOCAT2 is a software pipeline for metagenomic sequence assembly and gene prediction with novel features for taxonomic and functional abundance profiling. The automated generation and efficient annotation of non-redundant reference catalogs, by propagating pre-computed assignments from 18 databases covering various functional categories, allow for fast and comprehensive functional characterization of metagenomes. Availability and Implementation: MOCAT2 is implemented in Perl 5 and Python 2.7, is designed for 64-bit UNIX systems, and supports high-performance computing via the LSF, PBS or SGE queuing systems; source code is freely available under the GPL3 license at http://mocat.embl.de.
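
The idea of annotating a non-redundant catalog by propagating pre-computed assignments can be illustrated with a small Python sketch: each catalog gene inherits the annotations computed once for its best-matching reference gene, across several annotation sources. The names `best_hit` and `precomputed`, and the database labels, are hypothetical; MOCAT2 itself is written in Perl 5 and Python 2.7, and its actual data model may differ.

```python
def propagate_annotations(best_hit, precomputed):
    """Annotate catalog genes by copying pre-computed reference annotations.

    best_hit: catalog gene id -> id of its best-matching reference gene,
              or None when no reference hit passed the alignment filters.
    precomputed: reference gene id -> {database name: annotation}.
    Returns catalog gene id -> {database name: annotation}; genes without a
    usable hit get an empty dict rather than being dropped.
    """
    annotated = {}
    for gene, ref in best_hit.items():
        # Copy so that editing one catalog entry cannot alter the reference table.
        annotated[gene] = dict(precomputed.get(ref, {})) if ref is not None else {}
    return annotated


if __name__ == "__main__":
    precomputed = {
        "ref1": {"db_functional": "func_A", "db_resistance": "none"},
        "ref2": {"db_functional": "func_B"},
    }
    best_hit = {"cat_gene_1": "ref1", "cat_gene_2": "ref2", "cat_gene_3": None}
    for gene, ann in propagate_annotations(best_hit, precomputed).items():
        print(gene, ann)
```

Under this assumption, the expensive searches against each database are done once for the reference genes, so annotating a new catalog reduces to one similarity search plus dictionary lookups.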

    GUNC: detection of chimerism and contamination in prokaryotic genomes

    Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genomes remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC complements existing approaches by targeting previously underdetected types of contamination: we conservatively estimate that 5.7% of genomes in GenBank, 5.2% in RefSeq, and 15-30% of pre-filtered ‘high quality’ metagenome-assembled genomes in recent studies are undetected chimeras. GUNC provides a fast and robust tool to substantially improve prokaryotic genome quality. Source code (GPLv3+): https://github.com/grp-bork/gun
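
The notion of contig lineage homogeneity can be illustrated with a simplified Python check: for each contig, find the lineage assigned to most of its genes, and flag the genome when well-supported contigs disagree on that majority lineage. This is a hypothetical toy measure for intuition only; it is not GUNC's actual scoring method, and both thresholds are arbitrary assumptions.

```python
from collections import Counter


def contig_majority(lineages):
    """Return (majority lineage, fraction of genes supporting it) for one contig."""
    lineage, count = Counter(lineages).most_common(1)[0]
    return lineage, count / len(lineages)


def looks_chimeric(genome, min_genes=3, min_homogeneity=0.8):
    """Flag a genome whose homogeneous contigs point to different lineages.

    genome: contig id -> list of per-gene lineage assignments (e.g. genus labels).
    Contigs with fewer than `min_genes` genes, or whose majority lineage covers
    less than `min_homogeneity` of their genes, are ignored as uninformative.
    Both thresholds are illustrative assumptions.
    """
    majorities = set()
    for lineages in genome.values():
        if len(lineages) < min_genes:
            continue
        lineage, fraction = contig_majority(lineages)
        if fraction >= min_homogeneity:
            majorities.add(lineage)
    return len(majorities) > 1


if __name__ == "__main__":
    clean = {"c1": ["GenusA"] * 5, "c2": ["GenusA"] * 4 + ["GenusB"]}
    chimera = {"c1": ["GenusA"] * 5, "c2": ["GenusC"] * 6}
    print(looks_chimeric(clean), looks_chimeric(chimera))  # prints False True
```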

    proGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes

    Microbiology depends on the availability of annotated microbial genomes for many applications. Comparative genomics approaches have been a major advance, but consistent and accurate annotations of genomes can be hard to obtain. In addition, newer concepts such as the pan-genome are still being implemented to help answer biological questions. Hence, we present proGenomes2, which provides 87 920 high-quality genomes in a user-friendly and interactive manner. Genome sequences and annotations can be retrieved individually or by taxonomic clade. Every genome in the database has been assigned to a species cluster, and most genomes could be accurately assigned to one or multiple habitats. In addition, general functional annotations and specific annotations of antibiotic resistance genes and single nucleotide variants are provided. In short, proGenomes2 provides threefold more genomes, enhanced habitat annotations, updated taxonomic and functional annotation and improved linkage to the NCBI BioSample database. The database is available at http://progenomes.embl.de/

    Subspecies in the global human gut microbiome

    Population genomics of prokaryotes has been studied in depth in only a small number of primarily pathogenic bacteria, as genome sequences of isolates of diverse origin are lacking for most species. Here, we conducted a large-scale survey of population structure in prevalent human gut microbial species, sampled from their natural environment, with a culture-independent metagenomic approach. We examined the variation landscape of 71 species in 2,144 human fecal metagenomes and found that in 44 of these, accounting for 72% of the total assigned microbial abundance, single-nucleotide variation clearly indicates the existence of sub-populations (here termed subspecies). A single subspecies (per species) usually dominates within each host, as expected from ecological theory. At the global scale, geographic distributions of subspecies differ between phyla, with Firmicutes subspecies being significantly more geographically restricted. To investigate the functional significance of the delineated subspecies, we identified genes that consistently distinguish them in a manner that is independent of reference genomes. We further associated these subspecies-specific genes with properties of the microbial community and the host. For example, two of the three Eubacterium rectale subspecies consistently harbor an accessory pro-inflammatory flagellum operon that is associated with lower gut community diversity, higher host BMI, and higher fasting blood insulin levels. Using an additional 676 human oral samples, we further demonstrate the existence of niche-specialized subspecies in the different parts of the oral cavity. Taken together, we provide evidence for subspecies in the majority of abundant gut prokaryotes, leading to a better functional and ecological understanding of the human gut microbiome in conjunction with its host.
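
How single-nucleotide variation can separate sub-populations is easiest to see through allele frequencies. The Python sketch below compares samples by the mean absolute difference of a reference allele's frequency at shared variable positions, then groups samples whose distance falls below a cutoff (single linkage). It is an illustrative assumption, not the study's delineation method; the sample names, positions, frequencies, and the 0.3 cutoff are hypothetical.

```python
def allele_distance(a, b):
    """Mean absolute difference in allele frequency over shared positions.

    a, b: position -> frequency of a chosen reference allele in one sample.
    Returns None when the samples share no positions.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(abs(a[p] - b[p]) for p in shared) / len(shared)


def single_linkage_groups(samples, cutoff):
    """Group sample names whose pairwise distance is below `cutoff`.

    samples: name -> allele-frequency dict. Uses union-find, so the groups are
    the connected components of the "distance < cutoff" graph.
    """
    parent = {name: name for name in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = sorted(samples)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            d = allele_distance(samples[x], samples[y])
            if d is not None and d < cutoff:
                parent[find(x)] = find(y)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    samples = {
        "host1": {101: 0.95, 250: 0.10, 377: 0.90},
        "host2": {101: 0.90, 250: 0.05, 377: 0.85},
        "host3": {101: 0.05, 250: 0.92, 377: 0.10},
        "host4": {101: 0.10, 250: 0.88},
    }
    print(single_linkage_groups(samples, cutoff=0.3))  # prints [['host1', 'host2'], ['host3', 'host4']]
```

Within-sample frequencies close to 0 or 1, as in this toy data, are what one would expect if a single sub-population dominates each host; intermediate values would instead suggest a mixture.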

    Landscape of mobile genetic elements and their antibiotic resistance cargo in prokaryotic genomes

    Prokaryotic mobile genetic elements (MGEs), such as transposons, integrons, phages and plasmids, play important roles in prokaryotic evolution and in the dispersal of cargo functions like antibiotic resistance. However, each of these MGE types is usually annotated and analysed individually, hampering a global understanding of phylogenetic and environmental patterns of MGE dispersal. We thus developed a computational framework that captures diverse MGE types, their cargos and MGE-mediated horizontal transfer events, using recombinases as ubiquitous MGE marker genes and pangenome information for MGE boundary estimation. Applied to ∼84k genomes with habitat annotation, this framework mapped 2.8 million MGE-specific recombinases to six operational MGE types, which together contain on average 13% of all the genes in a genome. Transposable elements (TEs) dominated across all taxa (∼1.7 million occurrences), outnumbering phages and phage-like elements (<0.4 million). We recorded numerous MGE-mediated horizontal transfer events across diverse phyla and habitats involving all MGE types, disentangled and quantified the extent of hitchhiking of TEs (17%) and integrons (63%) with other MGE categories, and established TEs as dominant carriers of antibiotic resistance genes. We integrated all these findings into a resource (proMGE.embl.de), which should facilitate future studies on the large mobile part of genomes and its horizontal dispersal.
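
The combination of a recombinase marker with pangenome information for boundary estimation can be sketched as follows: starting from a recombinase gene, extend in both directions along the contig while neighbouring genes are accessory (outside the species core genome), and stop at the first core gene on each side. This simplified Python illustration rests on that assumption; the gene labels and core/accessory flags are hypothetical, and it is not the framework's actual boundary-estimation procedure.

```python
def mge_boundaries(genes, is_core, recombinase_index):
    """Estimate MGE boundaries around one recombinase gene on a contig.

    genes: ordered gene ids along the contig.
    is_core: gene id -> True if the gene belongs to the species core genome.
    recombinase_index: position of the recombinase in `genes`.
    Returns (start, end), the inclusive indices of the maximal run of
    non-core genes that extends outward from the recombinase.
    """
    start = recombinase_index
    while start > 0 and not is_core[genes[start - 1]]:
        start -= 1  # extend leftwards through accessory genes
    end = recombinase_index
    while end < len(genes) - 1 and not is_core[genes[end + 1]]:
        end += 1  # extend rightwards through accessory genes
    return start, end


if __name__ == "__main__":
    genes = ["core1", "acc1", "rec", "acc2", "acc3", "core2"]
    is_core = {g: g.startswith("core") for g in genes}
    start, end = mge_boundaries(genes, is_core, genes.index("rec"))
    print(genes[start:end + 1])  # prints ['acc1', 'rec', 'acc2', 'acc3']
```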